Abstract
Background: Statin therapy, despite proven cardiovascular benefits, remains underused. Social media platforms may capture patient perspectives that are less visible in clinical encounters.
Objective: This study aimed to characterize themes, sentiment, and decision-making factors related to statin therapy through large language model (LLM)–based analysis of Reddit discussions.
Methods: This cross-sectional observational study analyzed English-language Reddit posts and comments mentioning statins from January 2022 to May 2025, identified via keyword-based Reddit application programming interface searches (≤1000 posts per keyword). A total of 5328 retrieved discussions (n=1661, 31.2% posts and n=3667, 68.8% keyword-containing comments) from public subreddits were included. Themes, sentiments (positive, neutral, or negative), guideline-informed clinical relevance, information-seeking behavior, adverse effect mentions, decision factors, and adherence-related content were extracted using an LLM-based pipeline.
Results: Among 5328 discussions, prominent topics included adverse effects (n=1697, 31.9%), decision-making references related to laboratory results and physician advice (n=2767, 51.9% and n=2034, 38.2%, respectively), and alternative approaches (n=2485, 46.6%). Overall sentiment was neutral in 34% (n=1812) of discussions, negative in 30.9% (n=1646), and positive in 16.9% (n=900); the remainder were mixed or unclear. Statin-directed sentiment was neutral in 44.1% (n=2350) of discussions, negative in 25.2% (n=1343), and positive in 12.5% (n=666); the remainder did not express statin-directed sentiment. High clinical relevance was identified in 12.6% (n=672) of discussions. Adherence-related issues were mentioned in 29.8% (n=1587) of discussions. Among adverse effect mentions, muscle pain (n=129, 7.6%) and fatigue (n=110, 6.5%) were common.
Conclusions: LLM-enabled analysis of Reddit discourse highlights substantial negative sentiment, adherence-related concerns, and adverse effect narratives surrounding statin therapy. These findings suggest opportunities for patient-centered communication and shared decision-making strategies that address symptom attribution, uncertainty, and information needs in digital information environments.
doi:10.2196/85057
Introduction
Statins are among the most widely prescribed medications globally, with more than 200 million individuals using these agents to reduce cardiovascular morbidity and mortality []. These 3-hydroxy-3-methylglutaryl coenzyme A reductase inhibitors remain the cornerstone of atherosclerotic cardiovascular disease (ASCVD) prevention, demonstrating well-established efficacy in both primary and secondary prevention settings [,]. Despite their proven benefits, real-world statin therapy is frequently limited by concerns regarding safety and tolerability. In routine care, statin-related events are commonly documented, with myalgia and myopathy representing the most frequent category, and these events often precipitate temporary discontinuation and subsequent rechallenge []. Consequently, long-term use remains suboptimal; a recent systematic review and meta-analysis found that only approximately 62% of patients achieve good adherence (≥80% use) over a median follow-up of 24 months [], while population-based primary care data show frequent discontinuation with substantial restarting, consistent with an intermittent use pattern in practice [].
Traditional clinical research methods, including randomized controlled trials and observational studies, may not fully capture the breadth of patient experiences and concerns in routine care []. Effective communication between patients and health care providers is essential for shared decision-making and treatment adherence; however, barriers such as limited consultation time, health literacy challenges, and patient reluctance to disclose adverse experiences may constrain insight into the real-world challenges of medication use [,]. Patients may hesitate to report symptoms or concerns to clinicians due to fear of judgment, desire to avoid conflict with prescribers, or uncertainty about whether their experiences warrant clinical attention []. Social media platforms, especially Reddit (Reddit, Inc), provide access to unfiltered patient narratives that may not be expressed during clinical encounters [-]. Reddit hosts a wide range of health-related communities and has more than 50 million daily active users []. Within these forums, individuals share personal medical experiences, seek peer advice, and discuss treatment decisions. These discussions may provide valuable insights into patient beliefs, motivations, and barriers that influence medication adherence and cardiovascular risk management [-].
Recent advances in large language models (LLMs) offer scalable methods to analyze such unstructured, high-volume textual data, enabling systematic characterization of patient perspectives at a granularity that would be infeasible using manual qualitative approaches alone []. In the context of cardiovascular prevention, applying LLMs to social media discourse may provide novel insights into how patients interpret statin-related information, evaluate perceived risks and benefits, and make adherence decisions outside the clinical setting. Accordingly, this study aimed to systematically analyze Reddit discussions related to statin use using an LLM to characterize patient-reported experiences, identify recurrent concerns and misconceptions, and explore factors influencing decision-making and adherence. By integrating patient-generated narratives with computational text analysis, this work seeks to complement traditional evidence sources and inform more patient-centered approaches to cardiovascular risk management.
Methods
Study Design and Data Source
This cross-sectional study analyzed publicly available Reddit discussions about statin therapy posted between January 1, 2022, and May 1, 2025. Reddit was selected because it hosts large, topic-specific health communities in which users openly share medication experiences, treatment decisions, and interactions with health care providers in naturalistic settings. Data were accessed via the official Reddit application programming interface (API) using Python Reddit API Wrapper, limited to public subreddits, and no user contact or reidentification attempts were made.
Data Collection
Search Strategy
We retrieved content using “statin,” generic statin names (eg, atorvastatin, rosuvastatin, simvastatin, pravastatin, lovastatin, fluvastatin, and pitavastatin), and US brand names (eg, Lipitor, Crestor, Zocor, Pravachol, Livalo, Lescol, and Mevacor); full queries are provided in .
API Procedure
Using Python Reddit API Wrapper (version 7.7.1) in Python (version 3.9; Python Software Foundation), we queried Reddit’s search end point (sorted by “new” where available) between May 15 and May 20, 2025. Due to indexing and API constraints, each keyword returned up to 1000 submissions. For each submission, we extracted metadata (submission ID, created_utc, subreddit, title, selftext, author, score, comment count, and permalink) and downloaded its comment tree (comment ID, parent ID, created_utc, author, body, and score). We retained submissions within the study window; pagination, time stamps, and rate-limit handling are described in .
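The retrieval step described above could be sketched roughly as follows. This is a minimal illustration rather than the study's actual script: it assumes a configured `praw.Reddit` instance is passed in as `reddit`, treats the window boundaries as UTC midnights, and uses a hypothetical function name (`collect_keyword`) and record layout. Pagination details and rate-limit handling are not shown.

```python
from datetime import datetime, timezone

# Study window from the text; UTC boundaries are an assumption.
WINDOW_START = datetime(2022, 1, 1, tzinfo=timezone.utc).timestamp()
WINDOW_END = datetime(2025, 5, 1, tzinfo=timezone.utc).timestamp()


def collect_keyword(reddit, keyword, limit=1000):
    """Collect in-window submissions and their comments for one keyword.

    `reddit` is expected to behave like a configured praw.Reddit instance.
    """
    records = []
    # Search all subreddits, newest first; the listing is capped by `limit`.
    for sub in reddit.subreddit("all").search(keyword, sort="new", limit=limit):
        if not (WINDOW_START <= sub.created_utc <= WINDOW_END):
            continue  # keep only submissions inside the study window
        sub.comments.replace_more(limit=None)  # expand collapsed comment stubs
        comments = [
            {
                "comment_id": c.id,
                "parent_id": c.parent_id,
                "created_utc": c.created_utc,
                "author": str(c.author) if c.author else None,
                "body": c.body,
                "score": c.score,
            }
            for c in sub.comments.list()  # flattened comment tree
        ]
        records.append(
            {
                "submission_id": sub.id,
                "created_utc": sub.created_utc,
                "subreddit": str(sub.subreddit),
                "title": sub.title,
                "selftext": sub.selftext,
                "author": str(sub.author) if sub.author else None,
                "score": sub.score,
                "comment_count": sub.num_comments,
                "permalink": sub.permalink,
                "comments": comments,
            }
        )
    return records
```

Filtering on `created_utc` after retrieval, as done here, means the 1000-item cap applies before the window filter, which is one way the recency bias noted in the Limitations could arise.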
Eligibility
We included English-language posts and comments from public subreddits that matched 1 or more keywords (case-insensitive) within the study window and excluded removed and deleted placeholders, duplicates, non-English text (langdetect v1.0.9), and promotional and spam content ().
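As an illustration of case-insensitive keyword matching, the sketch below matches any token ending in "statin" (which covers the generic names listed in the search strategy) plus the listed US brand names. The exclusion set of look-alike terms (`NON_STATIN`) is a hypothetical guardrail added for illustration; the study's actual patterns and guardrails are not reproduced here.

```python
import re

# US brand names listed in the search strategy.
BRANDS = ["lipitor", "crestor", "zocor", "pravachol", "livalo", "lescol", "mevacor"]

# Hypothetical guardrail: "-statin" words that are not HMG-CoA reductase inhibitors.
NON_STATIN = {"nystatin", "somatostatin", "cilastatin"}

# Group 1: any word ending in "statin" (optional plural "s"); group 2: a brand name.
PATTERN = re.compile(
    r"\b(?:(\w*statin)s?|(" + "|".join(BRANDS) + r"))\b",
    re.IGNORECASE,
)


def matched_keywords(text):
    """Return lowercase statin keywords found in `text`, in order of appearance."""
    hits = []
    for m in PATTERN.finditer(text):
        token = (m.group(1) or m.group(2)).lower()
        if token in NON_STATIN:
            continue  # skip look-alike terms
        hits.append(token)
    return hits


def is_eligible(text):
    """A document is keyword-eligible if it matches 1 or more keywords."""
    return bool(matched_keywords(text))
```

For example, `matched_keywords("Switched from Lipitor to rosuvastatin")` would return the brand and generic name, while a comment mentioning only nystatin would not be eligible.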
Unit of Analysis
The unit of analysis was a Reddit discussion thread (eligible submission plus associated comments), enabling contextual interpretation of patient narratives and peer responses and supporting downstream LLM-based thematic extraction [].
Data Cleaning and Preprocessing
We removed URLs and normalized whitespace while preserving the original wording (no stemming or lemmatization). Privacy protection included removing @mentions and replacing detected email addresses and phone numbers with placeholders (eg, “[EMAIL],” “[PHONE]”); subreddit names were retained as public metadata, with reidentification risk minimized via aggregate reporting and avoidance of traceable quotations []. English-language posts were identified using fastText (lid.176.bin) [] (retain ≥0.80, adjudicate 0.60-0.80 by 2 investigators, and exclude <0.60). For analysis, comments were retained if they contained 1 or more keywords; misspellings and variants were captured via regex-based partial matching with guardrails. Exact duplicates were removed via SHA-256 hashing, and near duplicates were excluded using MinHash with Jaccard similarity ≥0.85 []. All steps were deterministic and version controlled; a PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses)–style flow diagram is provided in .
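The cleaning and deduplication steps can be sketched with the standard library alone. Several choices below are assumptions made for illustration: the regular expressions are deliberately simple (the phone pattern is US style only), the shingle size is 5 words, and near-duplicate detection computes exact Jaccard similarity on word shingles rather than MinHash, which estimates the same quantity more cheaply at scale. The language-confidence triage takes the English probability as an input and does not call fastText.

```python
import hashlib
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"(?<!\d)\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}(?!\d)")
MENTION_RE = re.compile(r"(?<!\w)@\w+")


def clean_text(text):
    """Strip URLs and @mentions, mask emails/phones, normalize whitespace."""
    text = URL_RE.sub(" ", text)
    text = EMAIL_RE.sub("[EMAIL]", text)  # before @mentions so addresses are masked, not cut
    text = PHONE_RE.sub("[PHONE]", text)
    text = MENTION_RE.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()


def language_decision(english_prob):
    """Triage by English-language confidence, using the thresholds in the text."""
    if english_prob >= 0.80:
        return "retain"
    if english_prob >= 0.60:
        return "adjudicate"
    return "exclude"


def sha256_key(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def shingles(text, k=5):
    """Set of k-word shingles; short texts yield a single shingle."""
    tokens = text.lower().split()
    if len(tokens) < k:
        return {" ".join(tokens)}
    return {" ".join(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def deduplicate(texts, threshold=0.85):
    """Drop exact duplicates (SHA-256), then near duplicates (Jaccard >= threshold)."""
    seen_hashes, kept, kept_shingles = set(), [], []
    for t in texts:
        h = sha256_key(t)
        if h in seen_hashes:
            continue
        s = shingles(t)
        if any(jaccard(s, prev) >= threshold for prev in kept_shingles):
            continue
        seen_hashes.add(h)
        kept.append(t)
        kept_shingles.append(s)
    return kept
```

The pairwise comparison in `deduplicate` is quadratic in the number of retained texts, which is the cost that MinHash with locality-sensitive hashing is typically used to avoid.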
LLM-Based Content Extraction
Structured extraction used GPT-4.1 [] via the OpenAI API (version 1.52.0) between May and June 2025 with standardized parameters (temperature=0.1, top_p=1.0, and max_tokens=4096). A single prompt template and predefined JSON schema were applied to all discussions. Two investigators iteratively developed the schema and decision rules through pilot testing on 50 purposively sampled discussions and consensus refinement of edge cases, consistent with clinical reasoning standards [,]. For 11 analytic domains (), outputs included structured variables, brief summaries, and verbatim evidence quotes supporting key classifications. Dietary changes and exercise were coded as alternative approaches only when explicitly discussed as substitutes for statin therapy (eg, attempting to avoid, delay, or discontinue statins); when described as adjuncts alongside statins, they were not classified as alternatives. Outputs failing JSON validation or missing mandatory evidence were automatically requeried up to 3 times; only schema-valid outputs were retained ().
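The validate-and-requery logic can be illustrated as below. This is a sketch under stated assumptions, not the study's schema: the field names (`primary_themes`, `overall_sentiment`, `evidence_quotes`) and the check that each quote appears verbatim in the source text are hypothetical, and "requeried up to 3 times" is interpreted as 1 initial attempt plus up to 3 retries. The `call_model` argument stands in for whatever function wraps the API request.

```python
import json

# Hypothetical subset of a schema; the study's actual JSON schema is not shown here.
REQUIRED_KEYS = {"primary_themes", "overall_sentiment", "evidence_quotes"}
ALLOWED_SENTIMENT = {"positive", "neutral", "negative", "mixed", "unclear"}


def validate(raw, source_text):
    """Return the parsed output if it passes all checks, otherwise None."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or not REQUIRED_KEYS <= data.keys():
        return None
    themes = data["primary_themes"]
    if not isinstance(themes, list) or not 1 <= len(themes) <= 3:
        return None  # 1 to 3 main discussion topics
    sentiment = data["overall_sentiment"]
    if not isinstance(sentiment, str) or sentiment not in ALLOWED_SENTIMENT:
        return None
    quotes = data["evidence_quotes"]
    if not isinstance(quotes, list) or not quotes:
        return None  # mandatory evidence is missing
    # Assumed check: every evidence quote must occur verbatim in the source.
    if any(not isinstance(q, str) or q not in source_text for q in quotes):
        return None
    return data


def extract(discussion, call_model, max_requeries=3):
    """Query the model; requery on invalid output; None if never valid."""
    for _ in range(1 + max_requeries):
        result = validate(call_model(discussion), discussion)
        if result is not None:
            return result
    return None
```

Returning `None` after the final failed attempt corresponds to dropping the discussion, so only schema-valid outputs are carried forward.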
| Domain | Description | Example extractions |
| Primary themes | 1 to 3 main discussion topics | |
| Statin medications | Specific statin names mentioned | |
| Experience type | Post classification | |
| Sentiment analysis | Multitarget emotional assessment | |
| Clinical relevance | Actionable insight assessment based on AHA and ACC guidelines | |
| Adverse effects | Adverse reactions mentioned | |
| Decision factors | Treatment choice influences | |
| Information seeking | Behavioral patterns | |
| Adherence issues | Medication compliance | |
| Alternative treatments | Nonstatin options | |
| Emotional indicators | Expressed emotions | |
aAHA: American Heart Association.
bACC: American College of Cardiology.
cASCVD: atherosclerotic cardiovascular disease.
dMI: myocardial infarction.
ePAD: peripheral artery disease.
fLDL-C: low-density lipoprotein cholesterol.
gCAC: coronary artery calcium.
All content was analyzed as user-generated narratives. References to clinician recommendations within discussions (eg, “my doctor recommended starting a statin”) were coded as patient-reported physician communication rather than independently verified clinical input. Posts containing general medical guidance without a described patient-clinician interaction were not coded under the “decision factors” physician recommendation category. Clinician perspectives were not independently ascertained; physician-related content was captured only through users’ own accounts of clinical interactions.
Clinical Relevance Tiering
A deterministic rule-based algorithm aligned with American Heart Association, American College of Cardiology, and US Preventive Services Task Force indicators [,] assigned high, medium, or low clinical relevance (). High relevance required explicit mention of ASCVD, low-density lipoprotein cholesterol ≥190 mg/dL, diabetes (in individuals aged 40‐75 years), or elevated 10-year ASCVD risk, and medium relevance reflected risk-enhancing factors (eg, family history of premature ASCVD, chronic kidney disease, inflammatory conditions, and coronary artery calcium ≥100 Agatston units) [,]; otherwise, discussions were low relevance. Given self-reported data, absence of criteria was coded as “unreported,” and tier assignment defaulted downward unless explicit evidence supported a higher tier.
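A rule set of this kind might look like the following sketch. The `Mentions` container and its field names are hypothetical, "elevated 10-year ASCVD risk" is treated as a precomputed flag because no numeric cutoff is given in this section, and unreported values (`None` or `False`) never raise the tier, mirroring the default-downward rule.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Mentions:
    """Explicitly reported items in one discussion (None/False = unreported)."""
    ascvd: bool = False
    ldl_c_mg_dl: Optional[float] = None
    diabetes: bool = False
    age: Optional[int] = None
    elevated_10y_risk: bool = False
    family_history_premature_ascvd: bool = False
    chronic_kidney_disease: bool = False
    inflammatory_condition: bool = False
    cac_agatston: Optional[float] = None


def relevance_tier(m):
    """Assign high, medium, or low relevance from explicit mentions only."""
    high = (
        m.ascvd
        or (m.ldl_c_mg_dl is not None and m.ldl_c_mg_dl >= 190)
        # Diabetes counts toward the high tier only with a reported age of 40-75.
        or (m.diabetes and m.age is not None and 40 <= m.age <= 75)
        or m.elevated_10y_risk
    )
    if high:
        return "high"
    medium = (
        m.family_history_premature_ascvd
        or m.chronic_kidney_disease
        or m.inflammatory_condition
        or (m.cac_agatston is not None and m.cac_agatston >= 100)
    )
    return "medium" if medium else "low"
```

In this sketch, a discussion that mentions diabetes without an age falls to the low tier unless a risk-enhancing factor is also reported; how the study handled that specific case is not stated here.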
Detection of LLM-Related Content
We searched for mentions of LLM tools (eg, ChatGPT [OpenAI], Claude [Anthropic], and Google Gemini) and categorized them as (1) seeking medical advice, (2) seeking general health information, (3) sharing LLM-generated content, or (4) incidental mentions. Categories 1 to 3 were flagged for sensitivity analyses ().
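A simple way to locate candidate mentions is a case-insensitive pattern over tool names, as in the sketch below. The pattern is an assumption built from the tool names that appear in this article; terms such as "Claude," "Bard," and "Gemini" are ambiguous in ordinary text, so hits would still need review before being assigned to 1 of the 4 categories.

```python
import re

# Tool names drawn from this article; the exact search terms used are not shown.
LLM_TOOL_RE = re.compile(
    r"\b(?:chat\s?gpt|gpt[- ]?\d\w*|claude|gemini|bard|deepseek)\b",
    re.IGNORECASE,
)

# Categories 1 to 3 (medical advice, general health information, shared
# LLM-generated content) are flagged; category 4 (incidental) is not.
SENSITIVITY_CATEGORIES = {1, 2, 3}


def find_llm_mentions(text):
    """Return normalized tool names mentioned in `text`, in order of appearance."""
    return [re.sub(r"\s+", "", m.group(0).lower()) for m in LLM_TOOL_RE.finditer(text)]


def flag_for_sensitivity(category):
    """True if a mention category should enter the sensitivity analyses."""
    return category in SENSITIVITY_CATEGORIES
```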
Validation
LLM outputs were validated via expert review (). A stratified random sample of 50 discussions (0.9% of 5328) spanning key domains was independently assessed by 2 reviewers (a clinical informaticist and a physician) using a 5-point accuracy scale (1=very poor to 5=excellent). Interrater agreement used Cohen κ. Reviewers also documented recurrent errors. Verbatim evidence requirements enabled direct verification of extracted labels.
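For reference, Cohen κ compares observed agreement with the agreement expected by chance from each reviewer's marginal rating frequencies. The sketch below computes the unweighted statistic; whether the study used an unweighted or weighted κ for the 5-point scale is not stated, so this choice is an assumption, and the ratings in the usage note are invented.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen kappa for two raters scoring the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed proportion of exact agreement.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement from the product of marginal frequencies.
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1:
        return 1.0  # both raters used a single identical category
    return (observed - expected) / (1 - expected)
```

With illustrative ratings `[5, 5, 4, 4, 3, 5, 4, 5, 3, 4]` and `[5, 5, 4, 3, 3, 5, 4, 5, 3, 5]`, observed agreement is 0.80 and chance agreement is 0.34, giving κ of about 0.70.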
Statistical Analysis
Multivariable logistic regression identified factors associated with negative sentiment toward statins. Cluster-robust SEs were estimated at the subreddit level. Covariates were prespecified (clinical relevance tier, adverse effect mentions, adherence-related content, information-seeking behavior, specific statins, and subreddit category). Absence of a mention was coded as “not reported.” Diagnostics included variance inflation factors and influential observation checks. Results are reported as odds ratios with 95% CIs; 2-sided P<.05 indicated significance.
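Odds ratios and their 95% CIs follow from the fitted log-odds coefficients by exponentiation. The sketch below shows that conversion and its inverse; it does not fit the model or compute cluster-robust SEs, which would require a statistical library, and the function names are illustrative.

```python
import math

Z_95 = 1.959964  # two-sided 95% normal quantile


def odds_ratio_ci(beta, se, z=Z_95):
    """Convert a log-odds coefficient and its SE to (OR, lower, upper)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def implied_beta_se(odds_ratio, lower, upper, z=Z_95):
    """Recover the coefficient and SE implied by a reported OR and 95% CI."""
    beta = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se
```

Applied to a reported estimate such as an adjusted odds ratio of 3.42 (95% CI 2.89‐4.05), `implied_beta_se` gives a coefficient of roughly 1.23 on the log-odds scale with an SE of roughly 0.086.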
Ethical Considerations
This study analyzed publicly accessible Reddit posts and comments as secondary data and involved no direct contact, interaction, intervention, or attempt to reidentify users. Informed consent was not obtained because the study used publicly available online content, the investigators did not interact with users, and obtaining consent from all posters was not feasible. No compensation was provided because no participants were recruited or contacted. The study was exempt from institutional review board approval and aligned with the Association of Internet Researchers’ Internet Research: Ethical Guidelines 3.0 [].
Results
Dataset Characteristics and Distribution
After systematic filtering, the final dataset comprised 5328 discussions contributed by 4832 unique users across multiple Reddit communities. The most frequently represented forum was r/Cholesterol (n=2722, 51.1%). Nearly half of the discussions (n=2552, 47.9%) referenced a specific statin. The most commonly mentioned agents were rosuvastatin (n=1276, 23.9%) and atorvastatin (n=1013, 19%). Dataset characteristics and statin mentions are summarized in .
| Characteristic and category | Mentions, n (%) |
| Document type | |
| Posts | 1661 (31.2) |
| Keyword-containing comments | 3667 (68.8) |
| Source community | |
| r/Cholesterol | 2722 (51.1) |
| | 576 (10.8) |
| r/AskDocs | 300 (5.6) |
| | 92 (1.7) |
| r/HeartAttack | 87 (1.6) |
| r/diabetes_t2 | 82 (1.5) |
| r/stroke | 46 (0.9) |
| Other | 1423 (26.7) |
| Primary discussion type | |
| | 3210 (60.2) |
| | 882 (16.6) |
| | 734 (13.8) |
| | 502 (9.4) |
| Mention of a specific statin | |
| Yes | 2552 (47.9) |
| No | 2776 (52.1) |
| Most frequent statins | |
| Rosuvastatin | 1276 (23.9) |
| Atorvastatin | 1013 (19) |
| | 279 (5.2) |
| | 211 (4) |
aGeneric and brand name mentions were combined.
Thematic Analysis and Primary Discussion Topics
Thematic analysis showed that discussions most commonly focused on treatment effectiveness (3311/14,007, 23.6% of thematic mentions), followed by safety and tolerability concerns (n=2366, 16.9%) and alternative lifestyle interventions (n=2081, 14.9%; ). Privacy-preserving, paraphrased exemplar posts (1‐2 per major theme) and the corresponding LLM theme classifications are provided in .

Sentiment Analysis
Overall sentiment was neutral in 34% of discussions, negative in 30.9%, and positive in 16.9%, with the remainder classified as mixed or unclear. Sentiment distributions varied significantly by community type (χ²(8)=335.7; P<.001). The proportion of discussions with negative sentiment was substantially higher in medical advice–seeking forums (eg, r/AskDocs: 77.3%) than in lifestyle-oriented forums (eg, r/Biohackers: 9.1%).
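The test statistic for a comparison like this is the Pearson chi-square for independence in a contingency table. The sketch below computes the statistic and its degrees of freedom; the P value is omitted because it needs the chi-square distribution function, which is not in the Python standard library. The table layout behind the reported result is not given in this section, so any counts passed in are the reader's own.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table.

    `table` is a list of rows of observed counts; every row and column
    total is assumed to be nonzero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df
```

Eight degrees of freedom would arise, for example, from a table with 3 community types and 5 sentiment categories, since (3−1)×(5−1)=8.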
Clinical Relevance Assessment
Among 5328 discussions, using guideline-informed criteria, 12.6% (n=672) of discussions were classified as high clinical relevance, 22.4% (n=1193) as medium relevance, and 65% (n=3463) as low relevance. High-relevance content was most frequent in r/stroke (40/46, 87.0%), r/HeartAttack (41/87, 47.1%), r/diabetes_t2 (29/82, 35.4%), and r/AskDocs (82/300, 27.3%). Clinical information elements were reported as follows: laboratory values (n=2048, 38.4%), cardiovascular events (n=362, 6.8%), family history (n=908, 17%), and lifestyle factors (n=2616, 49.1%).
Adverse Effect Reports and Safety Concerns
Among 5328 discussions, adverse effects were reported in 31.9% (n=1697; 95% CI 30.7%‐33.2%) of discussions. Among discussions that referenced adverse effects, the most frequent included muscle pain (n=129, 7.6%) and fatigue (n=110, 6.5%), followed by cognitive symptoms such as “brain fog” (n=61, 3.6%). A detailed breakdown of adverse effects, including neuropsychiatric symptoms, is provided in . In 0.3% (n=17) of discussions, users explicitly stated that no adverse effects occurred.
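A 95% CI for a proportion such as 1697 of 5328 can be approximated with the normal (Wald) interval. The interval method used in the study is not stated, so the Wald formula here is an assumption; for these counts it yields bounds close to the reported interval, with small differences that could reflect rounding or a different method.

```python
import math


def wald_ci(successes, n, z=1.959964):
    """Proportion with a normal-approximation (Wald) 95% CI, clipped to [0, 1]."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


# Adverse effect mentions among all discussions.
p, lower, upper = wald_ci(1697, 5328)
print(f"{p:.1%} (95% CI {lower:.1%} to {upper:.1%})")
```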
Decision-Making Factors and Treatment Influences
Of 5328 discussions, the most frequently coded factors associated with statin-related decisions were laboratory results (n=2767, 51.9%), physician recommendations (n=2034, 38.2%), adverse effects (n=1593, 29.9%), and family history of cardiovascular disease (n=822, 15.4%). Additional factors included lifestyle modifications (n=1092, 20.5%), online research (n=573, 10.8%), cost (n=316, 5.9%), and insurance coverage (n=72, 1.4%). Genetic predisposition and clinical guideline recommendations were each coded in 0.5% (n=26 for both) of discussions.
Alternative Treatment Discussions
Among 5328 discussions, alternative treatments were coded in 46.6% (n=2485) of discussions. The most frequent nonpharmacologic alternatives were dietary changes (n=449, 8.4%) and exercise (n=406, 7.6%). Pharmacologic alternatives included ezetimibe (n=168, 3.2%), Repatha (n=84, 1.6%), and Zetia (n=70, 1.3%). Supplements included fish oil (n=60, 1.1%), coenzyme Q10 (n=57, 1.1%), and red yeast rice (n=53, 1%). Weight loss was coded in 1.4% (n=73) of discussions.
Information-Seeking Behaviors and Community Engagement
Among 5328 discussions, experience sharing was coded in 78.2% (n=4167) of discussions and advice seeking was coded in 46.1% (n=2456). Discussions focused on alternative treatments occurred in 7.2% (n=386) of cases, while 5.3% (n=284) included questioning medication necessity. Advice seeking was most frequent in r/AskDocs (n=292, 97.3%).
Emotional Indicators and Adherence Issues
Of the 5328 discussions, emotional expressions were identified in 85.2% (n=4537) of discussions. The most frequently coded emotions were frustration (n=1705, 32%) and confusion (n=911, 17.1%). Medication adherence issues were present in 29.8% (n=1587) of the discussions, including discontinuation (n=522, 9.8%) and dose modification (n=437, 8.2%). Multiple codes per discussion were permitted; therefore, percentages may sum to more than 100%.
Mentions of LLM Tools in Reddit Discussions
Explicit mentions of LLM tools occurred in 23/5328 (0.4%) discussions, totaling 38 mentions. Of the 38 mentions, ChatGPT accounted for 32 (84.2%), followed by other GPT variants (n=3, 7.9%) and Claude, Bard/Gemini, or DeepSeek (n=3, 7.9% combined). Mentions were coded as statin-related medical decision-making (n=2, 5.3%), information seeking (n=6, 15.8%), and general references (n=30, 78.9%). Discussions containing LLM mentions were most commonly observed in r/PeterAttia (6/23, 26.1%) and r/Cholesterol (5/23, 21.7%). Mentions increased from 6 in 2023 to 32 between January and May 2025.
Expert Validation Results
Expert validation of a stratified sample of 50 discussions showed a mean validation score of 4.67 (SD 0.74). Interrater agreement was high (Cohen κ=0.85; 95% CI 0.78-0.92). Expert feedback noted recurring coding challenges in distinguishing genetic risk factors (eg, lipoprotein[a]) from family history, as well as in identifying cardiovascular event–related content (eg, coronary artery calcium scores).
Statistical Associations
Sentiment distributions differed across communities (χ²(8)=335.7; P<.001). Negative sentiment was more frequent in medical advice–seeking communities than in lifestyle-focused communities (eg, r/AskDocs: 77.3% vs r/Biohackers: 9.1%; P<.001). Adverse effect reporting also differed by community type (χ²(2)=124.6; P<.001). Adverse effect mentions and adherence issues were associated with negative sentiment (P<.001 for both). In adjusted multivariable analysis, factors associated with negative sentiment included adverse effect mentions (adjusted odds ratio [aOR] 3.42, 95% CI 2.89‐4.05; P<.001), posts in medical advice communities (aOR 2.76, 95% CI 2.31‐3.30; P<.001), adherence issues (aOR 2.18, 95% CI 1.84‐2.58; P<.001), and high clinical relevance content (aOR 1.47, 95% CI 1.18‐1.84; P=.001; ).
| Variable | Adjusted odds ratio (95% CI) | P value | |
| Adverse effect mentions | 3.42 (2.89‐4.05) | <.001 | |
| Posts in medical advice communities | 2.76 (2.31‐3.30) | <.001 | |
| Adherence issues | 2.18 (1.84‐2.58) | <.001 | |
| High clinical relevance content | 1.47 (1.18‐1.84) | .001 | |
Discussion
Main Findings
In this large-scale analysis of statin-related discussions on Reddit, patient discourse was shaped primarily by perceived adverse effects, uncertainty about benefits, and peer validation, with relatively limited reference to formal cardiovascular risk stratification. Although statins are strongly endorsed by clinical guidelines for ASCVD prevention [,], real-world patient narratives were frequently framed around experiential and emotional factors, including fear of long-term harm, symptom attribution, and ambivalence toward medical authority. This divergence between guideline-based evidence and patient-centered concerns helps explain persistently suboptimal statin adherence despite decades of robust trial data [].
Notably, fewer than 1 in 8 discussions met criteria for high clinical relevance based on American Heart Association, American College of Cardiology, and US Preventive Services Task Force risk thresholds [,]. Instead, most conversations focused on nonspecific symptoms, laboratory fluctuations, or lifestyle considerations, suggesting that engagement with statin therapy often occurs outside a formal risk-benefit calculus. Therefore, effective statin counseling may require attention to patients’ beliefs and concerns, beyond clinical risk factors alone.
The Adverse Effect Paradox: Perception vs Trial Evidence
Adverse effect concerns were prominent in our dataset (n=1697, 31.9% of discussions), which is not directly comparable to trial-based incidence estimates but may reflect a perception-evidence gap that shapes beliefs and adherence decisions. Evidence from blinded randomized trials and individual-participant meta-analyses suggests that the excess risk of muscle symptoms attributable to statins is small and that most muscle symptom reports under blinded conditions are not attributable to statin therapy []. This divergence is consistent with nocebo-related expectation and attribution mechanisms within the broader information environment. In the Anglo-Scandinavian Cardiac Outcomes Trial–Lipid-Lowering Arm, muscle-related adverse events were reported more frequently during the unblinded extension than during blinded treatment despite comparable exposure, supporting expectation-driven symptom attribution [].
Cognitive concerns (eg, “brain fog”) appeared in 3.6% (n=61) of discussions, although randomized trials have not demonstrated statin-attributable cognitive impairment []. Regulatory agencies have noted rare, generally reversible postmarketing reports, and the prominence of such concerns online may reflect expectation-driven symptom attribution []. Mechanistically, true statin myopathy occurs in a subset of patients, including those with SLCO1B1 variants or drug-drug interactions, but the high salience of adverse effects in online communities likely reflects both heterogeneity in susceptibility and selection effects (symptomatic users preferentially posting) [,]. Clinicians should anticipate that patients initiate statins within an information ecosystem that can magnify harm narratives. Proactive counseling, clear differentiation of evidence-based risks, and structured symptom assessment (eg, Statin-Associated Muscle Symptom Clinical Index) may reduce nocebo-driven discontinuation [,].
Emotional Burden as a Hidden Driver of Nonadherence
Beyond physical symptoms, statin discussions carried substantial emotional content (n=4537, 85.2%), with frustration (n=1705, 32%) and confusion (n=911, 17.1%) predominating. These patterns suggest that statin decisions may be shaped by emotional responses (eg, frustration and anxiety) in addition to objective risk appraisal. Negative emotion is consistently associated with poorer adherence across chronic disease contexts. A meta-analysis by DiMatteo et al [] found markedly higher nonadherence among patients with depression. Similarly, medication-specific emotional distress has been associated with treatment discontinuation, even after accounting for clinical depression [,]. In our dataset, adherence problems (n=1587, 29.8%) frequently co-occurred with adverse effect narratives and negative sentiment, suggesting a pathway in which perceived harms generate distress that undermines persistence. These emotional dimensions may be underelicited in routine care, where time-constrained visits often prioritize biomarker review and dose adjustment. Incorporating a brief assessment of medication-related distress into cardiovascular prevention workflows, paired with motivational interviewing, could identify patients at risk for emotionally driven discontinuation earlier in the treatment course.
Community as an Information Filter: The Ecology of Online Health Discourse
Sentiment and content varied substantially across Reddit communities. Negative sentiment was more prevalent in medical advice–seeking subreddits (eg, r/AskDocs: 77.3%) than in lifestyle-focused subreddits (eg, r/Biohackers: 9.1%). Condition-specific forums (eg, r/stroke and r/HeartAttack) contained the highest proportion of clinically relevant content, while dietary communities (eg, r/keto) more often reflected cholesterol skepticism that diverged from guideline framing [].
These patterns are consistent with community selection as an “information filter,” in which users preferentially encounter narratives aligned with prevailing community norms and their own concerns; similar dynamics have been described in other online health communities [,]. Such community-specific patterns may reinforce users’ existing perspectives. For example, dietary forums showed more skepticism toward statins, while condition-specific communities (eg, r/HeartAttack) contained more secondary prevention content. Therefore, community-specific outreach may be more effective than generic education. Strategies could include engaging trusted voices within skeptical forums, tailoring evidence presentation to community values, and directly addressing prevalent misconceptions (eg, “cholesterol myth” narratives), thereby complementing traditional patient counseling.
The Emerging Role of Artificial Intelligence in Patient Decision-Making
Explicit mentions of LLMs were uncommon (n=23, 0.4%) but increased from 6 in 2023 to 32 in early 2025, suggesting growing uptake. Users described using tools such as ChatGPT to interpret laboratory results, weigh treatment options, and prepare questions for clinical encounters. To the best of our knowledge, few studies have quantified artificial intelligence (AI) tool use within real-world cardiovascular treatment discussions. Prior research has evaluated LLM performance on medical queries [,], but implications for patient expectations and decision-making remain incompletely characterized. These findings raise questions about how AI-mediated information may shape patient perspectives before clinical visits, warranting further investigation and practical guidance for clinicians.
Limitations
This study has several limitations. First, Reddit API retrieval imposes sampling constraints. Search results are not uniformly sampled. Each keyword query is capped (≤1000 submissions), and ranking algorithms can overrepresent newer posts. This may yield a recency- and visibility-biased corpus. Accordingly, theme and sentiment frequencies should be interpreted as visibility-weighted patterns within the retrieved dataset rather than population-level prevalence for all statin-related content on Reddit; this may inflate high-salience topics (eg, adverse effects, discontinuation, and emotionally charged narratives) and complicate temporal comparisons. Second, Reddit’s pseudonymous structure prevents verification of demographics, diagnoses, lipid values, comorbidities, and outcomes. Users are unlikely to be representative of the broader statin-using population, limiting generalizability—particularly to older adults and those with lower digital access or health literacy. Third, self-reported social media content is subject to recall bias and selective posting, potentially overrepresenting unusual or negative experiences. Fourth, although we used expert-guided prompt development and validation, LLM-based extraction may misclassify nuanced language (eg, sarcasm or irony) or clinical attribution, which could affect domain-specific estimates. Evidence-quote requirements and human validation mitigate but do not eliminate this risk. Fifth, findings are limited to 1 platform and English-language content; discourse may differ across platforms and languages. Finally, this cross-sectional design cannot establish causality, assess within-person changes, or link discourse to clinical outcomes. Because nondisclosure in social media reflects “unreported” rather than missing at random, our conservative handling of unreported variables may underestimate clinical relevance and attenuate associations. 
We mitigated these limitations through transparent reporting, stratified expert validation, and emphasis on associations rather than causal claims. Future work should consider time-stratified sampling, multiplatform triangulation, and linkage to external clinical data (eg, registries, claims, or electronic health record–linked cohorts) where feasible.
Conclusions
This study highlights a substantial misalignment between guideline-based statin risk stratification and the lived experiences expressed by patients online. Across 5328 Reddit discussions, discourse centered on perceived adverse effects (n=1697, 31.9%), emotional distress (n=4537, 85.2%), and mentions of “natural” alternatives (n=2485, 46.6%), patterns that may help explain persistent challenges in long-term adherence. These findings suggest that effective cardiovascular prevention must go beyond information provision to address nocebo effects and treatment-related frustration that shape patients’ responses to therapy. Methodologically, our validated LLM-enabled pipeline captures nuanced dimensions of treatment experience (eg, ambivalence and peer influence) that are poorly measured by conventional surveillance approaches. As patients increasingly navigate AI-mediated information environments, evidence-informed engagement with digital spaces may support shared decision-making and improve long-term outcomes.
Acknowledgments
During manuscript preparation, the authors used ChatGPT (OpenAI) for language editing to improve clarity and readability. The tool was used only to assist with wording and expression. All artificial intelligence–generated suggestions were critically reviewed and revised by the authors, who take full responsibility for the accuracy, integrity, and originality of the final manuscript.
Funding
This work was supported by the National Institutes of Health (grant R00LM014097).
Data Availability
The datasets generated or analyzed during this study are not publicly available because sharing Reddit-derived data could increase the risk of user traceability and would be inconsistent with the study’s privacy protections.
Authors' Contributions
Conceptualization: SL, JL
Data curation: SL
Formal analysis: SL
Investigation: SL
Methodology: SL, JL
Software: SL
Writing—original draft: SL
Writing—review and editing: SL, JL
Conflicts of Interest
None declared.
Multimedia Appendix 4
Example posts with large language model–derived theme and sentiment classifications.
DOCX File, 17 KB
Multimedia Appendix 5
Frequency of specific adverse effects mentioned in Reddit discussions on statin therapy.
DOCX File, 20 KB
Abbreviations
AI: artificial intelligence
aOR: adjusted odds ratio
API: application programming interface
ASCVD: atherosclerotic cardiovascular disease
LLM: large language model
PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses
Edited by Amaryllis Mavragani; submitted 30.Sep.2025; peer-reviewed by Leon Wreyford, Yenan Zhu; final revised version received 02.Feb.2026; accepted 24.Mar.2026; published 10.Apr.2026.
Copyright © Siru Liu, Jialin Liu. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 10.Apr.2026.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research (ISSN 1438-8871), is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.

